Added KNN nested vector search benchmark cases for Faiss + Lucene. #5518

Open: wants to merge 1 commit into main

0ctopus13prime commented May 14, 2025

Description

This PR adds nightly benchmark triggers for the k-NN nested vector search case to the Jenkins crontab.
The new benchmarks cover two engines, Lucene and Faiss, on OpenSearch 3.0 and 3.1.

Human-Readable Crontab Parameters

3.0 Faiss nested cohere-1m

  H 5 * * *
  %BUNDLE_MANIFEST_URL=https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/3.0.0/latest/linux/arm64/tar/dist/opensearch/manifest.yml;
  TEST_WORKLOAD=vectorsearch;
  SINGLE_NODE_CLUSTER=false;
  DATA_NODE_COUNT=3;
  DATA_INSTANCE_TYPE=r6g.2xlarge;
  USE_50_PERCENT_HEAP=true;
  USER_TAGS=run-type:nightly,segrep:disabled,arch:arm64,instance-type:r6g.2xlarge,major-version:3x,cluster-config:arm64-r6g.2xlarge-3-data-3-shards-1-replica-faiss-cohere-nested-1m;
  ADDITIONAL_CONFIG=knn.algo_param.index_thread_qty:2;
  WORKLOAD_PARAMS={
  "target_index_name": "target_index",
  "target_field_name": "nested_field.target_field",
  "target_index_body": "indices/nested/nested-faiss-index.json",
  "target_index_dimension": 768,
  "target_index_bulk_size": 100,
  "target_index_bulk_index_data_set_format": "hdf5",
  "target_index_bulk_index_data_set_corpus": "cohere-nested",
  "target_index_bulk_indexing_clients": 10,
  "target_index_max_num_segments": 1,
  "target_index_space_type": "innerproduct",
  "query_k": 5,
  "query_body": {
    "docvalue_fields": [
      "_id"
    ],
    "stored_fields": "_none_"
  },
  "query_data_set_format": "hdf5",
  "query_data_set_corpus": "cohere-nested",
  "query_count": 10000
};
  CAPTURE_NODE_STAT=true
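
The entry above pairs a cron schedule with a `%`-prefixed list of semicolon-separated `KEY=VALUE` parameters. As a rough illustration of that layout, the sketch below splits such an entry into its schedule and a parameter dictionary. The helper name `parse_cron_entry` and the shortened sample entry are hypothetical, and the sketch assumes that no parameter value contains a semicolon, which holds for the entries in this PR but may not hold in general.

```python
import json


def parse_cron_entry(entry: str):
    """Split a 'SCHEDULE %KEY=VALUE;KEY=VALUE' entry into (schedule, params).

    Hypothetical helper for illustration only; it is not part of the
    Jenkins job or of opensearch-build.
    """
    schedule, _, param_text = entry.partition("%")
    params = {}
    for chunk in param_text.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        key, _, value = chunk.partition("=")
        params[key.strip()] = value.strip()
    # WORKLOAD_PARAMS carries a JSON object; decode it when present.
    if "WORKLOAD_PARAMS" in params:
        params["WORKLOAD_PARAMS"] = json.loads(params["WORKLOAD_PARAMS"])
    return schedule.strip(), params


# Shortened sample entry (an assumption, trimmed from the entry above).
entry = """H 5 * * *
  %TEST_WORKLOAD=vectorsearch;
  DATA_NODE_COUNT=3;
  WORKLOAD_PARAMS={
  "target_field_name": "nested_field.target_field",
  "query_k": 5
};
  CAPTURE_NODE_STAT=true"""

schedule, params = parse_cron_entry(entry)
print(schedule)
print(params["WORKLOAD_PARAMS"]["query_k"])
```

Note that plain parameters stay strings (`DATA_NODE_COUNT` is `"3"`), while values inside `WORKLOAD_PARAMS` keep their JSON types.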

3.1 Faiss nested cohere-1m

  H 5 * * *
  %BUNDLE_MANIFEST_URL=https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/3.1.0/latest/linux/arm64/tar/dist/opensearch/manifest.yml;
  TEST_WORKLOAD=vectorsearch;
  SINGLE_NODE_CLUSTER=false;
  DATA_NODE_COUNT=3;
  DATA_INSTANCE_TYPE=r6g.2xlarge;
  USE_50_PERCENT_HEAP=true;
  USER_TAGS=run-type:nightly,segrep:disabled,arch:arm64,instance-type:r6g.2xlarge,major-version:3x,cluster-config:arm64-r6g.2xlarge-3-data-3-shards-1-replica-faiss-cohere-nested-1m;
  ADDITIONAL_CONFIG=knn.algo_param.index_thread_qty:2;
  WORKLOAD_PARAMS={
  "target_index_name": "target_index",
  "target_field_name": "nested_field.target_field",
  "target_index_body": "indices/nested/nested-faiss-index.json",
  "target_index_dimension": 768,
  "target_index_bulk_size": 100,
  "target_index_bulk_index_data_set_format": "hdf5",
  "target_index_bulk_index_data_set_corpus": "cohere-nested",
  "target_index_bulk_indexing_clients": 10,
  "target_index_max_num_segments": 1,
  "target_index_space_type": "innerproduct",
  "query_k": 5,
  "query_body": {
    "docvalue_fields": [
      "_id"
    ],
    "stored_fields": "_none_"
  },
  "query_data_set_format": "hdf5",
  "query_data_set_corpus": "cohere-nested",
  "query_count": 10000
};
  CAPTURE_NODE_STAT=true
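
To make the workload parameters more concrete, here is a minimal sketch of how a nested k-NN search request could be assembled from `target_field_name`, `query_k`, and `query_body`. The `nested` plus `knn` query shape follows the OpenSearch k-NN query DSL for nested fields. Whether the vectorsearch workload builds exactly this body is an assumption, as are the helper name `build_nested_knn_query`, the `size` setting, and the three-element sample vector (the real vectors have 768 dimensions).

```python
def build_nested_knn_query(field_name, vector, k, extra_body=None):
    """Build a nested k-NN search body (illustrative sketch, not the workload's code)."""
    # "nested_field.target_field" -> nested path "nested_field"
    path = field_name.rsplit(".", 1)[0]
    body = {
        "size": k,  # assumption: return as many hits as neighbors requested
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "knn": {
                        field_name: {"vector": list(vector), "k": k}
                    }
                },
            }
        },
    }
    # Merge top-level extras such as docvalue_fields and stored_fields.
    body.update(extra_body or {})
    return body


# Values taken from WORKLOAD_PARAMS above; the vector is a toy placeholder.
query_body = {"docvalue_fields": ["_id"], "stored_fields": "_none_"}
body = build_nested_knn_query(
    "nested_field.target_field", [0.1, 0.2, 0.3], 5, query_body
)
print(body["query"]["nested"]["path"])
```

Setting `stored_fields` to `_none_` and reading `_id` from doc values keeps each response small, since the stored source is not returned with each hit.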

3.0 Lucene nested cohere-1m

  H 5 * * *
  %BUNDLE_MANIFEST_URL=https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/3.0.0/latest/linux/arm64/tar/dist/opensearch/manifest.yml;
  TEST_WORKLOAD=vectorsearch;
  SINGLE_NODE_CLUSTER=false;
  DATA_NODE_COUNT=3;
  DATA_INSTANCE_TYPE=r6g.2xlarge;
  USE_50_PERCENT_HEAP=true;
  USER_TAGS=run-type:nightly,segrep:disabled,arch:arm64,instance-type:r6g.2xlarge,major-version:3x,cluster-config:arm64-r6g.2xlarge-3-data-3-shards-1-replica-lucene-cohere-nested-1m;
  ADDITIONAL_CONFIG=knn.algo_param.index_thread_qty:2;
  WORKLOAD_PARAMS={
  "target_index_name": "target_index",
  "target_field_name": "nested_field.target_field",
  "target_index_body": "indices/nested/nested-lucene-index.json",
  "target_index_dimension": 768,
  "target_index_bulk_size": 100,
  "target_index_bulk_index_data_set_format": "hdf5",
  "target_index_bulk_index_data_set_corpus": "cohere-nested",
  "target_index_bulk_indexing_clients": 10,
  "target_index_max_num_segments": 1,
  "target_index_space_type": "innerproduct",
  "query_k": 5,
  "query_body": {
    "docvalue_fields": [
      "_id"
    ],
    "stored_fields": "_none_"
  },
  "query_data_set_format": "hdf5",
  "query_data_set_corpus": "cohere-nested",
  "query_count": 10000
};
  CAPTURE_NODE_STAT=true

3.1 Lucene nested cohere-1m

  H 5 * * *
  %BUNDLE_MANIFEST_URL=https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/3.1.0/latest/linux/arm64/tar/dist/opensearch/manifest.yml;
  TEST_WORKLOAD=vectorsearch;
  SINGLE_NODE_CLUSTER=false;
  DATA_NODE_COUNT=3;
  DATA_INSTANCE_TYPE=r6g.2xlarge;
  USE_50_PERCENT_HEAP=true;
  USER_TAGS=run-type:nightly,segrep:disabled,arch:arm64,instance-type:r6g.2xlarge,major-version:3x,cluster-config:arm64-r6g.2xlarge-3-data-3-shards-1-replica-lucene-cohere-nested-1m;
  ADDITIONAL_CONFIG=knn.algo_param.index_thread_qty:2;
  WORKLOAD_PARAMS={
  "target_index_name": "target_index",
  "target_field_name": "nested_field.target_field",
  "target_index_body": "indices/nested/nested-lucene-index.json",
  "target_index_dimension": 768,
  "target_index_bulk_size": 100,
  "target_index_bulk_index_data_set_format": "hdf5",
  "target_index_bulk_index_data_set_corpus": "cohere-nested",
  "target_index_bulk_indexing_clients": 10,
  "target_index_max_num_segments": 1,
  "target_index_space_type": "innerproduct",
  "query_k": 5,
  "query_body": {
    "docvalue_fields": [
      "_id"
    ],
    "stored_fields": "_none_"
  },
  "query_data_set_format": "hdf5",
  "query_data_set_corpus": "cohere-nested",
  "query_count": 10000
};
  CAPTURE_NODE_STAT=true
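
The four entries are nearly identical, so it can help to see which settings actually change between engines. The sketch below compares a hand-copied subset of the Faiss and Lucene settings from the entries above. The dictionaries and the helper `diff_settings` are illustrative assumptions, not something the Jenkins job uses.

```python
def diff_settings(left, right):
    """Return {key: (left_value, right_value)} for keys whose values differ."""
    keys = sorted(left.keys() | right.keys())
    return {
        key: (left.get(key), right.get(key))
        for key in keys
        if left.get(key) != right.get(key)
    }


# Subset of settings copied by hand from the Faiss and Lucene entries.
faiss = {
    "cluster-config": "arm64-r6g.2xlarge-3-data-3-shards-1-replica-faiss-cohere-nested-1m",
    "target_index_body": "indices/nested/nested-faiss-index.json",
    "target_index_space_type": "innerproduct",
    "target_index_dimension": 768,
    "query_k": 5,
}
lucene = {
    "cluster-config": "arm64-r6g.2xlarge-3-data-3-shards-1-replica-lucene-cohere-nested-1m",
    "target_index_body": "indices/nested/nested-lucene-index.json",
    "target_index_space_type": "innerproduct",
    "target_index_dimension": 768,
    "query_k": 5,
}

differences = diff_settings(faiss, lucene)
for key, (faiss_value, lucene_value) in differences.items():
    print(f"{key}: {faiss_value} -> {lucene_value}")
```

For this subset, only the `cluster-config` tag and `target_index_body` differ between engines. Between 3.0 and 3.1, the entries differ in `BUNDLE_MANIFEST_URL`.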

Issues Resolved

N/A

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@0ctopus13prime (Author) commented May 14, 2025

Not sure why groovy tests are failing, can anyone tell what is causing it to fail? 😅

@gaiksaya (Member)

> Not sure why groovy tests are failing, can anyone tell what is causing it to fail? 😅

Please see https://github.com/opensearch-project/opensearch-build/blob/main/DEVELOPER_GUIDE.md#regression-tests
Run the Gradle command to update the regression files, which track what is added in the related Jenkinsfile.

0ctopus13prime force-pushed the vector-search-nested-support-dashboard branch from caabc9f to 2e52a1c on May 15, 2025 03:34
@0ctopus13prime (Author)

@gaiksaya Thank you, I've run the regression test. Could you take a look at this PR again?

@gaiksaya (Member)

> @gaiksaya Thank you, I've run the regression test. Could you take a look at this PR again?

I'll leave final approval to @rishabh6788, as he knows more about the configs and resources for running these benchmarks.
